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Abstract 



In this paper a two-level regression model is imposed on the ability parameters in an 
IRT model. The advantage of using latent rather than observed scores as dependent variables 
of a multi-level model is that this offers the possibility of separating the influence of item 
difficulty and ability level and modeling response variation and measurement error. Another 
advantage is that, contrary to observed scores, latent scores are test-independent, which offers 
the possibility of entering results from different tests in one analysis. Further, it will be shown 

j 

that also problems of measurement error in covariates in multilevel models can be solved in the 
framework of IRT-multilevel modeling. In this paper, the two-parameter normal ogive model 
will be used for the IRT measurement model. It will be shown that the parameters of the two- 
parameter normal ogive model and the multilevel model can be simultaneously estimated in 
a Bayesian framework using Gibbs sampling. Various examples using simulated data will be 
given. 

Key words: Bayes estimates, Gibbs sampler, item response theory, Markov chain 
Monte Carlo, multi-level model, two-parameter normal ogive model. 
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In educational and social research, there is a growing interest in the problems 
associated with describing the relations between variables of different aggregation level. In 
school effectiveness research, one may, for instance, be interested in the effects of the school 
budget on the educational achievement of the students. However, the former variable is defined 
on the school level while the latter variable is defined on the level of students. This gives 
rise to problems of properly modeling dependencies between variables. These problems can 
be coped with by multilevel models (Bryk & Raudenbush, 1992; De Leeuw & Kreft, 1986; 
Goldstein, 1987; Longford, 1993; Raudenbush, 1988). In the above example, students are 
nested in schools, and in a multilevel model the students would make up a first level and 
the schools a secondary level. Although most applications of the multi-level paradigm are 
found in regression and analysis of variance models (see, for instance Bryk & Raudenbush, 
1992), multi-level modeling does, in principle, apply to all statistical modeling of data where 
elementary units are nested within aggregates. Longford (1993), for instance, gives examples 
of multi-level factor analytical models and generalized linear models. Also in the field of item 
response theory some applications of the multi-level paradigm can be found (see, Adams et al., 
1997; Mislevy & Bock, 1989) . 

In the present paper, the following problem related to multilevel modeling is studied. 
In educational research, many variables are measured subject to error. This does predominately 
concern the dependent variables, but also covariates on the student and school level can be 
subject to measurement error. In practice, the multilevel models used belong to the framework 
of the usual linear multivariate normal model and solutions to the problem of measurement 
error boil down to applications of classical test theory (see, Longford, 1993, 1998). One of the 
drawbacks of classical test theory is that measurement error is supposed to be independent 
of the score level of the testee. In modem test theory, i.e. item response theory (IRT), 
measurement error is defined conditionally on the value of the latent ability variable. Therefore, 
it seems worthwhile to tackle the problem of measurement error in multilevel models in the 
framework of hierarchical IRT models. 

This paper consists of six sections. After the introduction section, a general multi- 
level-IRT model will be presented. In the next section, a Markov chain Monte Carlo (MCMC) 
estimation procedure will be described. Then, the model will be generalized further to allow for 
measurement errors on the predictor variables and the estimation procedure will be generalized 
to allow for this kind of measurement error. In Section 5, examples of the procedure will be 
iven. And finally, the last section contains a discussion and suggestions for further research. 
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Multi-level IRT models 



One-way random effects IRT ANOVA 

Consider a population of units, say schools, from which a sample of units indexed 
j = 1 , ... J is drawn. Individuals, say students indexed i = 1 , . . . rij 9 are nested within units. 
In this framework, Bryk & Raudenbush (1992) consider a two-level one-way random effects 
ANOVA model. For the first level, the model is given by 

Yu = P j + e ijt with ey ~ N( 0, a 2 ), (1) 

the second level is given by 

(3j : = 7 + Uj, with Uj ~ iV(0,r 2 ). (2) 



So the model entails that the level one unit means are sampled from a normal distribution with 

\ 

mean 7 and variance r 2 . Persons within a unit are independent and the disturbances of the 
regression coefficients in different schools are uncorrelated. This model can be generalized to 
an IRT-framework by imposing the linear structure on unobserved latent variables 6ij rather 
than on observed variables Yij. The assumption is introduced that unidimensional ability 
parameters are independent and normally distributed given (3 ^ So let 6ij \ fij ~ A f (/3 j , cr 2 )< 
Further, /? • ~ AT ( 7 , r 2 ). Combining these two assumptions results in 
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Without conditioning on group membership the ability parameters of the respondents are 
dependent. To see this, consider the transformation 
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Then it follows that 
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(5) 



So over groups, the ability parameters of the respondents are dependent. The ability parameters 
are linked to observed dichotomous responses y^, k = 1, . . . , K. Let y i3 be the response 
pattern of person i in group j, and let Y be the data matrix. One of the major estimation 
procedures in IRT is marginal maximum likelihood (MML, Bock & Aitkin, 1981; Mislevy, 
1986). The impact of the above dependency structure on an MML estimation procedure can 
be assessed by inspection of a likelihood function marginalized over all random effects. This 
likelihood function can be written as 



L(l, cr 2 ,T; Y) 



n / />•••>/ i e v)9( e v i &v a 

j *b 

n / n / p( y v 1 9(0*1 1 



) h (Pj I 'y,T)d0 lj ,...,d0 nj] dP J 
h (Pj I 7. T)dPj, (6) 



where p(y i; \ 0 i3 ) is the IRT model specifying the probability of observing response pattern 
yij as a function of 0i 3 , g(0i 3 | P^cr 2 ) is the density of 0 i3 and h(fi 3 | 7 , r) is the density of 
P 3 . It can be seen that the dependency structure results in nesting of integrations that might 
complicate an MML estimation procedure. Therefore, below an alternative approach will be 
studied. But first the model will be generalized further. 



A Multi-level IRT model 

Bryk & Raudenbush (1992) present the above one-way random effects AN OVA model 
as a special case of a general model given by 

Yij = f3 0j + ... + P q jX q ij + . . . + PqjXchj + eij, with e i} ~ iV(0, cr 2 ), and (7) 
Pqj = 7 9 0 + ■ ■ ■ + IqsWsqj + ■ ■ ■ + 7 qS^Sqj + U qj, for q = 0, . . . , Q. (8) 

Let Uj be a vector with elements u q3 , q = 0, . . . , Q, which has a normal distribution 
with mean zero and covariance matrix equal to T, that is, u 3 ~ N( 0,T). In (7), X qij and 

0 

ERJC 

imiMiffaHaaaa 



7 



Multi-level IRT 6 



P qj are level one predictor variables and regression coefficients, respectively. The latter are 
assumed to be random variables modeled by (8), where W sqj and j qs are level two predictor 
variables and regression coefficients, respectively. 

In an IRT context this model translates to a structural model defined by 

= Poj + • • • + PqjXqij + + P Qj XQij + e ij) with eij ~ N( 0, a 2 ), (9) 

with the distribution of P qji q — 0, . . . , Q as defined in (8). 

Below, it will prove convenient to write the model in matrix notation. Let X ; - represent 
the matrix with explanatory variables for the n 3 pupils on school j, j — 1,.. . J, that is, 
Xj = (Xij, . . . , X n .^ and X^ = (Xotj, . . . , Xq^) 1 . Consider the block diagonal matrix 
X with ( rij x (Q + 1)) blocks X r This matrix can be written as {X^} ® I j, where ® signifies 
the direct product. So X is an (n! + . . . + nj — N) x ( J (Q + 1)) block diagonal matrix, with 
the Xi, . . . , Xj as the diagonal blocks. Further, 6j = (0^, . . . , 0 nj jY, the ability parameters of 
the pupils of school j, and ej can be stacked as 6 — {0j}®l N ande = {ej}®l N , where 1 N is 
a column vector in R N with 1 in every component. In the same way, fi 3 — [P Qjy . . . , p Qj Y are 
the regression coefficients for the level one model for the ability parameters of the pupils of 
school j, and the J (Q + l)-vectors / 3 can be defined as f3 — {Pj} <8> 1 j. Then (9) can be 
written as 



6 = X/3 + e, 



( 10 ) 



with E (e) = 0, and E ( ee l ) = a 2 l N . 

The matrices W qj — (Wo q j y . . . , Ws q j) 1 contain the explanatory variables for the 
regression coefficient p qj . Define W ; = {W w -} 0 l Q+l and W is the ( J (Q + 1)) x 
{{Q + 1 ){S+ 1)) matrix W = {W ; } ® 1 j. Further, define u = {u ; } ® 1 j and T — 
{7,} ® 1q+i, with iij = (u 0j , .... UQjf and ■y q = (y g0 , y gS )\ respectively. Then (8) 
can be written as 

/3-WT + u, (11) 

with E (u) = 0, E (urf) = Ij ® T — T. T is a block diagonal matrix with J blocks T. 

In the above formulation the coefficients of all the predictors in the level 1 model are 
treated as random, that is, as varying across level 2 units. In certain applications, it can be 
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desirable to constrain the effects of one or more of the level 1 predictors to be identical across 
level 2 units. This is accomplished by reformulating the hierarchical model as a mixed model 
(Raudenbush, 1988). However, the issues and procedures discussed below also apply to mixed 
model settings. 

Up to this point, the ability parameter 0 is unspecified and unknown. In the next 
section, an IRT model and an estimation will be introduced. 

A MCMC estimation procedure for a multilevel IRT model 

Recently, Albert (1992) derived a procedure for simulating sampling from the 
posterior distribution of the item and person parameters of the two-parameter normal ogive 
model using the Gibbs sampler (Gelfand et al., 1990; Gelman et al., 1995; Geman & Geman, 
1984) . In this paper, this approach will be generalized to the multilevel IRT model considered 
above. In the normal ogive model, the probability of a correct response of a person indexed ij 
on an item indexed /c, Y{j k = 1, is given by 



P ( Yijk = 1 | a ki 6 k ) = 4> (a k 6 {j - S k ) , (12) 

where 3> denotes the standard normal cumulative distribution function, and a k and 6 k are the 
discrimination and difficulty parameter of item k, respectively. Below, the parameters of item 
k will also be denoted by £ k = (a*, 8 k ) 1 . 

As can be seen from (6), making inferences about the parameters of the multilevel 
IRT model in an MML framework entails integrating over high dimensional probability 
distributions. By drawing samples from these distributions, sample averages can be computed 
to approximate expectations. Unfortunately, no procedure is known to obtain the required 
samples directly. Therefore, a Bayesian perspective where all parameters are viewed as random 
variables will be adopted and a Markov chain Monte Carlo (MCMC) procedure will be used for 
evaluating the posterior distributions of the parameters. The MCMC chains will be constructed 
using the Gibbs sampler. 

Gibbs sampling proceeds as follows. Divide the vector u> into n components, u = 
(cji, . . . , cj n ). In each iteration of the Gibbs sampler each component will be drawn conditional 
on previously drawn values of all the others. So at each iteration m, each cj™ is sampled from 




9 



the conditional distribution given all the other components of to 
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pKK'.y), 

with u_ k = (tu™, . . . . . . , wJJ 1-1 ). In this way each component io k is updated 

conditionally on the latest values of u> for the other components. The idea is to construct the 
model using a sequence of conditional probability distributions, and apply the Gibbs sampler 
to obtain samples from the posterior (target) distribution. 

To implement the. Gibbs sampler for the normal ogive model, Albert (1992) augments 
the data by introducing independent random variables Z ijk , which are assumed to be normally 
distributed with mean a k 6ij - 6 k and variance equal to one. It is assumed that Y ljk = 1 if 
Z ijk > 0 and Y ijk = 0 otherwise. Though the joint distribution of (Z, 9, £) has an intractable 
form, the fully conditional distribution of each of the three parameters are easy to simulate. 
So each iteration m consists of three steps: (1) draw Z m+1 from its distribution given 
and 9 m ,( 2) draw 9 m+l from its distribution given Z m+1 and £ m , and (3) draw £ m+1 from 
its distribution given Z m+1 and 9 m+l . In the next section it will be shown that this idea can 
be extended to simultaneously estimating the posterior distribution of all parameters in the 
multilevel IRT model. 



Estimation of the Multilevel IRT Model Using Gibbs Sampling 

To implement the Gibbs sampler a vector of independent random variables Z = 
(Zin,. .. ,Z njJK ) is introduced, where Z ljk is distributed as defined above. With the 
introduction of Z the joint posterior distribution of the parameters of the multilevel model 
and the normal ogive model (Z, 0, £ , /3 ,<t 2 , T, T | Y) is given by 



p(Z,0,£,/3,cr 2 ,r,T | Y, X, W) oc Hlj ((]W^ I P I fy,* 2 ,**)) 

pfa |r,T,w.)p(r|T) 

p(Op(ct 2 )p(T) , (13) 



with 



P ( Z ijk I Oij, £ k ,y ijk ) oc 4> (Z ijk - a k 9ij - S k , 1) [I {Z tjk >0)1 {y ijk = 1) 

+ I(Z ijk <0)I(y ijk = 0)). 
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A vague prior p(£) — n^=i I ( a fc > 0) is chosen for the item parameters to insure that each 
item will have a positive discrimination index. The other priors will be discussed below. The 
distribution (13) has an intractable form and will be very difficult to simulate. Therefore, 
a Gibbs sampling algorithm will be used where the three steps of the original algorithm by 
Albert (1992) are replaced by seven steps. Each step consist of sampling from the posterior of 
one of the seven parameter vectors Z , 0, £, /3, cr 2 , T, T conditionally on all other parameters. 
These fully conditional distributions are each tractable and easy to simulate. So the remaining 
problem is finding the conditional distributions of Z, 0, £, /3, T, a 2 and T, respectively. 

Step 1: Sampling Z. Given the parameters 0 and £, the variables Z l3 k are independent, and 






jk 



Y distributed 



{ 



N (akOij — Ski 1) truncated at the left by 0 if Y i3 k — 1 
N {ctk^ij — 1) truncated at the right by 0 ifYijk = 0. 



(14) 



Step 2: Sampling 0. The ability parameters are independent given Z,£,/3 and cr 2 . Using 
equation (10) and (14) it follows that 



p(Pij I a p(Z ij \0i j ,t)p(d ij \(3 j ,(T 2 ,Xi j ) 

K 



cx exp 



— ^ (%ijk + $k ~ &k0ij) 2 
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oc exp 


2^ ( 0lJ “ M 


exp 



exp 









(15) 



with 



7 ) _ ^2k - 1 {Zijk + fik) 

o 



and v = a kj • Inspection shows that (15) is a normal model for the regression of 

Zijk + Sk on ctk with 0 i3 as a regression coefficient, where 0^, has a normal prior parameterized 
by fi 3 and cr 2 (e.g., see. Box & Tiao, 1973; Lindley & Smith, 1972). So the fully conditional 
posterior density of 0 i3 is given by 



% | Z tJ ,£,f3 v a 2 ~N 



( %/ v + X.j/3,7 cr 2 1 \ 

l 1/v + l/cr 2 'l/v+l/a 2 J 



( 16 ) 
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Step 3: Sampling £. Conditional on 9, Z k = (Z nk , . . . , Z rillk , . . . , Z nj j k ) 1 satisfies the linear 
model 



Z k =[9 -1 ]€fc + e*, (17) 

where e k = (en/c,- • • ,£ njJkY is a random sample from N (0, 1). Combining (17) with the 
prior for p (£ ) = n^=i I > 0), it follows that 

j n j 

zm a nn (j) (X k 0{j 1 ) P 

j—l z=l 

= exp (Z k - HCfc)* (Z fc - H&)) p (&) 

with H = [ 6 — 1 ] . Therefore, 

I 0, Z k ~ iV (£*, (H'H)' 1 ) / (a* > 0) , (18) 

where £ k is the usual least square estimator based on (17). 

Step 4: Sampling (3. The fully conditional posterior density of f3j is given by 

P (Pj I 9j,a 2 ,T,T) oc p{9 } ,| | I\T) 

“ 

exp (j- (0, - wjj'r 1 (3, - WTij 

with 0^ = (XjXj) 1 y^jOj. Notice that the fully conditional posterior of f3j entails a model for 
the regression of 0j on Xj, with f3j as regression coefficients, where the regression coefficients 
have a normal prior induced by the level 2 model (11), that is, the regression of (3j on Wj. 

Define E j = a 2 (X t j X i )~\d = 'Zj%+T- 1 W j r andD = (Sj 1 + T" 1 ) -1 . Then 
it follows that 

Pi |0 j> <x 2 ,r,T~7V(Dd,D). (19) 
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Step 5: Sampling I\ The matrix F is the matrix of regression coefficients for the regression of 
P 3 on W j. The unbiased estimator for F will be the generalized least square estimator. Because 



j 

(r I t) a | r,T)p(r | t) 



1 



a ex P ( 4 E (Pi - W T)‘ T - 1 (Pj - w,r) ) , 



j=l 



using an improper noninformative prior density for F it follows that 



-l 



^,T' 



N 



J2wp- l W } ) ^W'T- 1 ^, 



( 20 ) 



j=l 



<j = 1 



Step 6: Sampling cr 2 . The conjugated prior density for the variance a 2 is the Inv — y 2 (vq, co- 
upon setting vq = 0, it follows that the noninformative prior density for the variance is 
p (cr 2 ) a cr -2 . Then the conditional posterior distribution for cr 2 is given by 



p(a 2 \0,P) oc p{e | p,a 2 )p(a 2 ) 



<x 



(<r 2 )" (?+l) exp (^S 2 ) , 



with S 2 = jf ZU (°i ~ XjPiY (°i ~ XiPj), thus 



<7 2 |0,/3~/nu-x 2 (iV,S 2 ). (21) 

The prior density for the variance cr 2 is improper, but yields a proper conditional posterior 
density for a 2 . 

Step 7: Sampling T. Above, W ; and P 3 are defined as the matrix of explanatory variables 
and the vector of regression coefficients for school j, respectively. The level 2 model for this 
school can be written as P- — W ; T -f ix, ,with E (u 3 ) — 0, E (u^u*) = T. Therefore, 

v (t | /3,, r) .oc p(Pj | r,T)p(T) 

cx |T|-5 exp ( '-i (P 3 - W,r ) l T-t {p } - W,T)) p(T) . 
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Assuming a non-informative prior for T, aggregating over schools results in 




|TPexp (-ifr (ST- x ))p(T) 
|T|-^ +1 )exp (-Itr (ST- 1 )), 



with 



j 



s = £(^- w ’ r ) (^-w.r) 4 



and thus, 



T | /3, F ~ inv — Wishart ( J, S l ) . 



( 22 ) 



With initial values 0 (O) , £ (0) , /3 (0) , o- 2(0) , r (0 >, T (0) the Gibbs sampler repeatedly 
samples Z, 0, £, (3, T, a 2 and T from the distributions (14), (16), (18), (19), (20), (21), (22) 
(in that order). The values of the initial estimates are important for the rate of convergence. 
When poor initial values are chosen, convergence will be very slow. Consider, for example, 
formula (16). When the parameters of the multilevel model are estimated conditional on poor 
estimates of 0, the poor estimates of the multilevel model parameters will subsequently produce 
poor estimates of the ability parameters. This is because, in step 2 the prediction of 0 from the 
multilevel model will dominate the sampled values of 0 when the level 1 residual variance a 2 
is smaller than the variance of 0 , that is, v. So after some iterations, all the sampled values 
of the parameters are far away from the optimal parameter values, while a 2 remains smaller 
than v. It will take a lot of iterations to alter this pattern. Therefore, the following procedure 
can be used to obtain better initial estimates. First, MML estimates of the item parameters are 
made under the assumption that 0 is normally distributed with fi — 0 and cr = 1 (see, Bock 
& Aitkin, 1981; Mislevy, 1986). Another suggestion might be to compute initial values using 
a distinct ability distribution for every subgroup j. These estimates can be computed using the 
program Bilog-MG (Zimowski et al., 1996). Then, using draws from the normal approximation 
of the standard errors of the parameter estimates of Bilog-MG as starting values, the MCMC 
procedure of Albert (1992) for estimating the normal ogive model can be run. That is, with the 
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assumption that 0 is standard normal distributed formula (16) becomes 



@ij | ijk i £ 



N 



f Qij/v 1 \ 

y l/v + 1 ’ l/v + 1 J 



(23) 



and Z, 0 and £ can be sampled from the distributions (14), (23), (18). As the Gibbs sampler has 
reached convergence compute the means of the sampled values of (Z, 0, £) to start sampling 
from the distributions (19), (20), (21) and (22). After convergence, means of the sampled 
values of (/3, T, cr 2 , T) are used as initial estimates. It is also possible to use an EM algorithm 
for estimating (/3, T, cr 2 , T) with the 0 (see, for instance Bryk & Raudenbush, 1 992). Once all 
initial values are estimated, equation (23) can be replaced by (16), and the complete seven-step 
MCMC procedure can be started for a simultaneously estimation of (Z, 0, £, /3, I\cr 2 , T). 

An important point is that the prior distributions for 0 in the MML model and the 
multilevel model have different means and variances. Therefore, the transition between the 
two models must be accompanied by introducing identification constraints which are practical 
in both models. This can, for instance, be accomplished by identifying both models by setting 
Y\ k a k = 1 and£ fc /?fe = 0 . 



Measurement Error on the Predictor Variables 

In this paper, measurement error in explanatory variables will be modeled by 
introducing an IRT model for the item responses related to these explanatory variables. First, 
measurement error on level 1 predictors will be considered. Assume that the latent variables C q ij 
are related to observable variables X^, (q = 1, . . . , Q) via a normal ogive IRT measurement 
model. In this case X qij = (X qi ji , . . . , X q ijK q )\ with realization (x^i, . . . , , denotes 

a response vector on a test with K q items. Notice that predictor X q ij has been replaced by a 
vector of item responses x^j. The posterior distribution of the parameters, (13), now becomes 

p (z, e , £ , /V 2 , r, t, C | Y, x, w) = p (z, 0, L Pc' 1 , r, t | Y, c, w) P (C I x) , (24) 

where p (£ | X) is the posterior of £ given X, that is, 

p(C|X)ocp(X|C)p(C). (25) 

In (25), p (X | C) is an IRT model and p (£) is a prior distribution. Because it is not realistic to 
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assume that that the predictor variables are independent, (25) entails a multivariate IRT model. 

Before the actual parameters £ will be identified, consider a parametrization £*. Let 
be the vector of latent predictor variables for a person indexed i and j, that is, Q- has 
elements Cqij* Further, suppose that for every predictor a two-parameter normal ogive model 
holds, that is, P (X qijk = 1 | Q;, a* fc , <5* fc ) = $ {a qk C q ij ~ 6 qk ), where r K,k and 6* k are item 
parameters of an item of predictor q. Because the predictor variables Q* qi - are considered 
dependent, it will be assumed that Qj has a multivariate normal distribution with mean zero and 
covariance matrix E*. However, the parametrization £* can be transformed to a parametrization 
C such that C has a multivariate normal distribution with mean zero and covariance matrix I, 
that is, the variables ^ qij become independent. Under this transformation, the normal ogive 
model transforms to 



P ( Xqijk — 1 | Cij) fiqk) — ^ (&kCij ^qk) j 



that is, every item response now depends on all latent dimensions. This gives rise to the 
following procedure. 

To sample from (24), the above seven-step procedure can be used to sample from 
p(Z,0,£,/3,cr 2 } r,T | Y,C,W), the only difference is that X is replaced by £, Further, 
sampling from p (£ | X) precedes with a multivariate version of the procedure by Albert (1992). 

So analogous with the above procedure, a random vector V = (Vnn> • . . , 
is introduced, where V q ijk ~ N (<*kCij — and it is supposed that X qi j k = 1 when 

Vqijk > 0 and X qi jk = 0 otherwise. After deriving the fully conditional distributions, the 
Gibbs sampler can again be used to simultaneously estimating the posterior distributions of all 
parameters. 

Step 8: Sampling V. This step is completely equivalent to step 1, so V q ijk, given and £, is 
independent with 



f N (otkCij - Sqh, l) truncated at the left by 0 if X qijk = 1 
\ N { (XkCij ~ f>qk, 1) truncated at the right by 0 if X qijk = 0. 



(26) 



Step 9: Sampling £ Let be the vector with latent predictor variables for a person indexed 
i and j. These are the regression coefficients in the normal linear model 




V"*j 4* S — "F £ ij j 
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where V = (Viyi, . . . , VQ ijKQ )\ S = (6 n , ■ ■ ■ ,Sqk q ) , C ij - (Cuj, ••• , Gc}ij) and A is 
a | YL q K q x Q^j matrix with elements ctk q . Furtermore, the vector £ t] has elements e q ijk, 
(q = 1 , . . . , Q) and (k = 1 , . . . , K q ) , which are independent and standard normally distributed. 
Using the fact that has a multivariate standard normal prior, it follows that 

Cij ~ n ((i + mt 1 ) -1 (i + V ' 1 )-') . (27) 

with C tj = (A 1 A) -1 A £ (V tJ + 6) and * = (A £ A) _1 . 

Step 10: Sampling £ qk . £ qk = ( a k , S qk )\ q = (1, . . . , Q) and k = (1, . . . , K q ) . Given C, the 
= (V qllk , Vqnjjk) 1 satisfies the linear model 

V,fc = [ C “1 ] iqk + e qk (28) 

where C = (Ci, - - - , C q) with C, = (C,n> • • • . ■ The e qk = (e qU k, ■ ■ ■ , e^jk)* is 
a random sample from N (0, 1). Combining the prior for p (£ qk ) = n^=o 7 ( a fc<? > 0) with 
equation (28) gives 



Q 

d qk I C, v* ~ N (l qk , (H £ H) - 1 ) n 7 (a kq > 0) , 

<?=0 

where H = [ C — 1 ] and £ qk is the least square estimator based on (28). 

The model is identified by specifying a multivariate standard normal prior for £. 

A Numerical Example 

In this section, a data set generated with a multilevel IRT model will be analysed. The 
data are simulated using a multilevel model with one explanatory variable without measurement 
error on both levels. The model is given by 

Oij = Poj + 0ijXnj + e l3 (29) 

0o j — Too + 7oi W i 0 j + u 0 j 
0i j = 7io T Tn^iij + w-ij, 



with 6ij ~ 7V(0,cr 2 ) and u q3 ~ N (0, r 2 ) . Response patterns are generated according to a 
normal ogive IRT model for a test of K = 20 dichotomous items with item parameters a = 1 
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and S sampled from N (0, .5). The ability parameters of 2, 000 students are divided over J = 10 
groups of rij = 200 students each, and generated with the multilevel model given by (29). In 
this ijiultilevel model is To = .1, T\ = .1 and a = .2. The explanatory variables X and W are 
drawn from N (0, 1) and N (1/2, 1), respectively. 

The convergence of the Gibbs sampler will be checked by monitoring the expected 
a posteriori estimate of each parameter and its standard deviation (sd) for several consecutive 
sequences of 1 000 iterations of the Gibbs sampler. The Gibbs sampler has reached convergence 
if differences are small. 

The sample variance of the estimates will underestimate the variance of the 
posterior mean due to the positive autocorrelation. Subsampling from a Markov chain 
to get approximately identical independent subsamples only results in poorer estimates, 
(MacEachem and Berliner, 1994). A computational simple and less naive method of estimating 
is through batching, (Ripley, 1987). The single long run of length n = ml , is divided into m 
successive batches of length Z, with batch means B \, . . . B m . The posterior mean B equals 
m ]C£Li and the variance estimator is 

i rn 

Correlation between the batches will be neglible if l is large enough, and m must be large 
enough to get a reliable estimate of var (Bi). The batch length l must be set in such a way 
that the lag-one autocorrelation of B { is less than .05. A side effect of batching with large m 
will be that each B \ is approximately normally distributed. If m is large enough to make Bi 
approximately independent and normally distributed, the (1 — a) confidence interval for the 
parameter of interest will be of the form 



{ B — t m - i a y/v , B + l.or 



where t m - 1 >0 is the upper a point of a t-distribution with m— 1 degrees of freedom. 

In the simulation study, 1,000 iterations with 500 bum in iterations were needed 
to compute the initial estimates. After that, 50, 000 iterations were made to estimate the 
parameters of the multilevel IRT model. 

In Table 1, the item parameter estimates issued from the Gibbs sampler and Bilog- 
MG, (Zimowski et al., 1996) are given. With respect to parameter recovery, it can be seen 
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that the Bilog-MG estimates of the discrimination parameter are lower than the simulation 
values. The Gibbs sampler produces higher discrimination estimates than Bilog-MG, because 
two item parameters, ae = 1 and 63 — 0 , were fixed instead of choosing the location (fi — 0 ) 
and scale (<r = 1) of the latent continuum. The values of the difficulty parameters estimated 
with the Gibbs sampler are generally quite similar to those estimated with Bilog-MG. The small 
standard deviations of the Gibbs estimates can be explained by the fact that the method of batch 
means often severely underestimates the true standard deviations, (Geyer, 1992). It is used here 
as a quick and dirty method, more sophisticated methods as the Window Estimator (Carlin & 
Louis, 1996; Geyer, 1992; Ripley, 1987 ) are also available. 

Both procedures represent different approaches to estimating the item parameters: 
Bilog-MG entails computing the posterior mode and in the Gibbs sampler the posterior mean 
is issued as the item parameter estimate. Thus, the symmetry of the distributions created using 
Gibbs sampler is of interest. In the present paper Bilog-MG is only used as a reference for the 
item parameter values. 



Insert Table 1 about here 



Figure 1 presents the posterior densities of a* for four specific items. In each plot of 
Figure 1, two lines are plotted, these reflect the density estimates based on 1, 000 and 50, 000 
simulated values. It can be seen that the first 1, 000 values produced with the Gibbs sampler 
to get initial estimates are quite close. The posterior modes are generally smaller than the 
posterior means because the items are skewed to the right, and appear to have heavy right tails. 
The conclusion is that the a k are not exact normally distributed, as assumed. 



Insert Figure 1 about here 

Table 2 presents the results of estimating the fixed and random effects of the multilevel 
model with HLM for Windows (Bryk et al., 1996) using the normally unknown 6 and the Gibbs 
sampler. Looking at the fixed effects it can be seen that they are generally quite similar. Also 
estimates of the random effects are almost identical for both methods. 



Insert Table 2 about here 




Finally, it is of interest to evaluate whether the multilevel IRT model is an improvement 
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over the usual multilevel model. The model will be less complex without the IRT model, 
but also less precise. It will be shown that using observed scores instead of latent scores as 
dependent variables will result in less precision in parameter recovery. 

The observed mean scores are scaled differently, so first the latent and observed 
variables are normally standardized. After standardization the group means are zero. So all 
there is left to do is modeling the variance among the groups and among students in groups. 
The model, given by formula (29), becomes 



0\ j = 7io + 7iiWiij+«ij> 

with eij ~ N (0, a 2 ) and u q j ~ N (0, r 2 ) . Table 3 presents the results of estimating the 
structural model using standardized latent scores and standardized observed mean scores (sum- 
scores) as dependent variables. 



The parameter estimates of the fixed effects computed using sum-scores differ much 
more from their true values than those computed using latent scores as dependent variables. 
To see this, compare the differences between the estimates of the fixed effects in Table 2 with 
the differences in Table 3. Also, the difference between the level 1 variance is much greater 
compared to the one in Table 2. An analogous difference can also be seen with the intraclass 
correlation coefficient. This coefficient expresses the proportion of variance in 6 accounted for 
by group-membership, that is, 



From Table 2 it can be seen that using the estimates from HLM results in p 0 = .300 and using 
the estimates from Gibbs sampler results in /3 0 = .336. In Table 3 it can be seen that using 
the standardized latent scores results in /3 0 = .327 and using the standardized observed scores 
results in p 0 — -199. This shows that using observed scores instead of latent scores leads to 
faulty parameter estimates and interpretation of the multilevel model. 



9ij = Poj + PijXuj -F 
Pq j ~ u Qj 



(30) 



Insert Table 3 about here 
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Discussion 

In the present article, it was shown that the Gibbs sampler can be used to 
simultaneously estimate all the parameters of the multilevel IRT model. Obtaining the marginal 
posterior distributions by integrating over all the unknows is highly impractible when the joint 
posterior is of high dimension. The method presented is very powerful because their are no 
limitations on the number of parameters or the number of explanatory variables. Even when 
there are many explanatory variables with measurement error, it is still a matter of sampling 
from all fully conditional posterior distributions. Although good initial values will speed up 
convergence, there are still many iterations necessary for a reliable estimate of all parameters. 
Further research will concentrate on the use of a Monte Carlo EM (MCEM) algorithm to limit 
the amount of iterations (Wei & Tanner, 1990). 

It is easy to incorporate different types of prior beliefs about the item parameters £. 
The example illustrates that the posterior density of the discrimination parameters appears 
to have heavy tails. Therefore it could be interesting to use a log-normal prior for the 
discrimination parameters (Mislevy, 1986). It is also possible to incorporate different priors 
for 7 , a 2 or T. In this paper Jeffreys’ prior is used for the variance components, that is, 
p(cr 2 ) oc a" 2 , p(r) oc t _1 . Jeffreys’ prior for r is potentially a problem in cases where J is 
small (Morris, 1983; Rubin, 1981 ). 

The Gibbs sampling formulation presented in this article can be extended to settings 
in which the fixed effects are distributed with heavy tails (Seltzer, 1993), to study the extent to 
which posterior means and intervals change as the degree of heavy-taildness assumed increases. 

Finally, this article has concentrated on inferences that assumes that the model is 
correct. The problem of model criticism is rather difficult. Posterior predictive checks has the 
problem that (predictive) data has to be generated from the estimated normal ogive IRT model, 
in order to compare the data, Y, with the posterior predictive values. When prior information 
is weak there are difficulties with the use of Bayes factors. And the problem is not just the 
use of improper prior distributions. O’Hagan (1995) showed that Bayes factors are inherently 
sensitive to errors of specification of prior distributions. Promising in this regard is the use of 
Fractional Bayes Factors in a hierarchical model, mentioned by Gilks (1995). 



0 
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Table 1 . Item parameter estimates of the normal ogive IRT model using Bilog-MG and the 
Gibbs sampler. 



Bilog-MG 



Gibbs sampler 



em at sd 


1 


.816 


.068 


2 


.911 


.078 


3 


.856 


.070 


4 


.818 


.067 


5 


.838 


.071 


6 


.942 


.085 


7 


.767 


.063 


8 


.758 


.063 


9 


.837 


.072 


10 


.824 


.070 


11 


.755 


.062 


12 


.878 


.073 


13 


.876 


.071 


14 


.919 


.076 


15 


.869 


.076 


16 


.886 


.075 


17 


.813 


.069 


18 


.823 


.069 


19 


.792 


.072 


20 


.771 


.063 



b k 



sd 






sd 



b k 



-.165 

.978 

-.003 

.042 

-.098 

1.311 

.208 

.316 

-.297 

-.166 

-.239 

.428 

.231 

-.405 

.833 

.688 

.119 

-.111 

-.666 

-.085 



.045 
.063 
.045 
.045 
.045 
.079 
.045 
.046 
.047 
.045 
.044 
.048 
.046 
.049 
.059 
.056 
.045 
. .045 
.053 
.044 



.944 

1.015 

.969 

1.032 

1.006 

.901 

.892 

.876 

.944 

.892 

.938 

.972 

.988 

1.021 

.981 

1.045 

.963 

.965 

.914 

.944 



.045 

.048 

.046 

.048 

.047 

0 

.042 

.041 

.043 

.043 

.042 

.042 

.043 

.045 

.044 

.046 

.042 

.042 

.040 

.041 



sd 



-.218 

.882 

-.101 

-.054 

-.179 

1.026 

.175 

.253 

-.337 

-.300 

-.279 

.337 

.129 

-.483 

.833 

.630 

.051 

-.200 

-.762 

-.154 



.016 

.017 

0 

.017 

.017 

.021 

.015 

.015 

.016 

.015 

.016 

.017 

.017 

.018 

.017 

.017 

.016 

.017 

.016 

.016 
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Table 2. Parameter estimates of the multilevel model, with the Gibbs sampler and HLM for 
Windows. 



Fixed Effect 

700 

701 
7io 
7n 

Random Effect 



UQj 

Uij 



HLM Gibbs sampler 



r 


sd 


r 


sd 


-.237 


.092 


-.114 


.013 


.191 


.122 


.208 


.010 


.352 


.069 


.352 


.013 


1.109 


.144 


1.015 


.048 


T0,Ti,<7 




To,T U a 


sd 


.085 




.103 


.004 


.131 




.140 


.001 


.198 




.203 


.002 
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Table 3. Parameter recovery of the multilevel model with latent scores and observed scores 
as dependent variables. 



Fixed Effect 


HLM 




HLM (sum-scores) 


r 


sd 


r 


sd 


7 10 


.387 


.071 


.505 


.060 


7n 


1.240 


.144 


.793 


.127 


Random Effect 


T0 } T l5 (T 




T0,Ti,<7 




UQj 


.107 




.117 




U\j 


.144 




.108 




Cij 


.220 




.470 
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Figure 1. Posterior densities of a/t for items 2, 5, 1 and 9. Dotted line is an estimate of 
density after 1,000 values, and solid line is an estimate after 50,000 values. 
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